CSSS/SOC/STAT 321: Lab 5

Descriptive Statistics: Two Variables

Tao Lin

Office Hours: Fri 1:30 - 3:30 PM Smith 35

Section Slides URL: soxv/CSSS-321-Labs

Goals for Week 5

  • QSS Tutorial 4
  • Key Questions in Week 5
    • What is selection bias in survey sampling? How can we solve this problem?
    • How can we describe the relationship between two variables?
  • Deciphering Problem Set 2

Survey Sampling

Agenda

  • Survey Sampling

  • Correlation

  • Quantile-Quantile Plot

  • Deciphering Problem Set 2

Source: Groves et al. 2009

  • Measurement
    • Construct validity: the extent to which the measure is related to the underlying construct/concept
    • Measurement error:
      • e.g. underreporting of sensitive behaviors
    • Processing error: e.g. data entry errors
  • Representation: we want our estimate of a given variable to be as representative (i.e. externally valid) as possible.
    • Coverage error: the nonobservational gap between the target population and the sampling frame
      • e.g. a telephone survey cannot reach people without phones, which can make the resulting sample less representative.
    • Sampling error: the nonobservational gap between the sampling frame and the sample
      • e.g. some members of the sampling frame are given a smaller chance of selection
    • Non-response error
      • Unit non-response
      • Item non-response
    • Adjustment error
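
To make the difference between coverage error and sampling error concrete, here is a minimal R sketch. Everything in it is hypothetical: the population size, the phone-ownership split, the support rates, and the variable names are invented for illustration and do not come from Groves et al.

```r
# Hypothetical population of 1,000 people; all numbers are invented.
has_phone <- rep(c(TRUE, FALSE), times = c(800, 200))

# 1 = supports some policy, 0 = does not.
# Assumed: 480 of 800 phone owners support it, 60 of 200 non-owners do.
support <- c(rep(c(1, 0), times = c(480, 320)),
             rep(c(1, 0), times = c(60, 140)))

pop_mean   <- mean(support)             # target population
frame_mean <- mean(support[has_phone])  # sampling frame: phone owners only

# Simple random sample of 400 people drawn from the frame
set.seed(321)
frame_ids   <- which(has_phone)
sample_ids  <- sample(frame_ids, size = 400)
sample_mean <- mean(support[sample_ids])

c(population = pop_mean, frame = frame_mean, sample = sample_mean)
```

In this toy setup, the gap between pop_mean and frame_mean is coverage error, and it does not shrink as the sample grows. The gap between frame_mean and sample_mean is sampling error, which tends to shrink with a larger simple random sample.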

Correlation

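\begin{aligned} \text{Corr}(x, y) &= \frac{\text{Cov}(x, y)}{\sigma_x \sigma_y} \\ &= \frac{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x}) (y_i - \bar{y})}{\sigma_x \sigma_y} \\ &= \frac{1}{n-1}\sum_{i=1}^n \left[ \frac{x_i - \bar{x}}{\sigma_x} \times \frac{y_i - \bar{y}}{\sigma_y} \right] \\ &= \frac{1}{n-1}\sum_{i=1}^n \left[ \text{z-score}(x_i) \times \text{z-score}(y_i) \right] \end{aligned}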

  • Covariance: the extent to which the two variables vary together, measured as the average product of their deviations from their means.
  • Correlation: covariance of the two variables rescaled by their standard deviations.
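
The two definitions above can be checked by hand in R. This is a minimal sketch on invented toy data (x and y are not from any course dataset). It computes the covariance, rescales it by the standard deviations, and confirms that the z-score version gives the same number as the built-in cor().

```r
# Toy data, invented for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)
n <- length(x)

# Covariance: average product of deviations (n - 1 in the denominator)
cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Correlation: covariance rescaled by the two standard deviations
corr_by_cov <- cov_xy / (sd(x) * sd(y))

# Equivalent form: average product of z-scores
z_x <- (x - mean(x)) / sd(x)
z_y <- (y - mean(y)) / sd(y)
corr_by_z <- sum(z_x * z_y) / (n - 1)

# All three comparisons should return TRUE
all.equal(cov_xy, cov(x, y))
all.equal(corr_by_cov, cor(x, y))
all.equal(corr_by_z, cor(x, y))
```

Note that sd() in R also divides by n - 1, which is why the hand computation and cor() agree.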

Quantile-Quantile Plot
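
A quantile-quantile plot compares two distributions by plotting the quantiles of one variable against the quantiles of the other. If the two distributions are the same, the points fall along the 45-degree line. The sketch below uses base R's qqplot() on simulated data; the two samples and their names (a, b) are hypothetical.

```r
set.seed(321)
# Two simulated samples (hypothetical): same spread, but b is shifted up by 1
a <- rnorm(200, mean = 0, sd = 1)
b <- rnorm(200, mean = 1, sd = 1)

# qqplot() plots the quantiles of a against the quantiles of b
qq <- qqplot(a, b,
             xlab = "Quantiles of a", ylab = "Quantiles of b",
             main = "QQ plot of two simulated samples")

# 45-degree reference line: where points would fall if the distributions matched
abline(0, 1, lty = "dashed")
```

With a shift like this one, the points should run roughly parallel to the 45-degree line but sit above it.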

Deciphering Problem Set 2

Name              Description
name              The judge’s name.
child             The number of children each judge has.
circuit.1         Which federal circuit the judge serves in.
girls             The number of female children the judge has.
progressive.vote  The proportion of the judge’s votes on women’s issues which were decided in a pro-feminist direction.
race              The judge’s race (1 = white, 2 = African-American, 3 = Hispanic, 4 = Asian-American).
religion          The judge’s religion (1 = Unitarian, 2 = Episcopalian, 3 = Baptist, 4 = Catholic, 5 = Jewish, 7 = Presbyterian, 8 = Protestant, 9 = Congregationalist, 10 = Methodist, 11 = Church of Christ, 16 = Baha’i, 17 = Mormon, 21 = Anglican, 24 = Lutheran, 99 = unknown).
republican        Takes a value of 1 if the judge was appointed by a Republican president, 0 otherwise. Used as a proxy for the judge’s party.
sons              The number of male children the judge has.
woman             Takes a value of 1 if the judge is a woman, 0 otherwise.
X                 Indicator for the observation number.
yearb             The year the judge was born.

Overview

  • Research Question: Does having daughters cause judges to vote in a pro-feminist direction?
    • Study type: observational study
    • Explanatory variable: whether a judge has at least one daughter.
    • Outcome variable: progressive.vote - The proportion of the judge’s votes on women’s issues which were decided in a pro-feminist direction.
  • Potential Confounders
    • a judge’s gender (Q2)
    • a judge’s party identification (Q2 & Q3)
    • whether a judge has at least one child (Q3)
    • the number of children (Q4)
    • other confounders? Can we assess them in the existing data? (Q5)
  • The goal of each question
    • Q1: summary statistics and check balance
    • Q2: association between progressive.vote and two confounders - republican and woman
    • Q3: association between progressive.vote and two confounders - whether a judge has at least one child and republican
    • Q4: association between progressive.vote and explanatory variable - whether a judge has at least one daughter, conditional on the total number of children
      • Considering confounding bias and covariate balance, should we include judges who have no children?
    • Q5: assess the plausibility of the unconfoundedness assumption
      • Is the number of children correlated with both children’s gender and progressive.vote?
      • Are any other pre-treatment variables potentially correlated with both children’s gender and progressive.vote?

Question 1

  • The total number of judges; gender composition and party composition
  • Party composition between male and female judges (Hint: create a contingency table showing proportion)
  • Range of the outcome variable progressive.vote.
  • Interpret the result.
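
As a sketch of the hint above, a contingency table of proportions can be built with table() and prop.table(). The data frame toy below is invented for illustration and only borrows two column names from the codebook; it is not the problem set data.

```r
# Toy data frame, invented for illustration
toy <- data.frame(
  woman      = c(0, 0, 0, 0, 1, 1, 1, 0),
  republican = c(1, 1, 0, 0, 0, 0, 1, 1)
)

nrow(toy)  # number of observations

# Counts: rows = woman, columns = republican
counts <- table(woman = toy$woman, republican = toy$republican)
counts

# margin = 1 gives proportions within each row (here, within each gender)
row_props <- prop.table(counts, margin = 1)
row_props
```

Using margin = 1 makes each row sum to 1, so the rows can be compared directly even when the groups differ in size. For a numeric variable, range() returns its minimum and maximum.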

Question 2

  • Create a boxplot with one single command to compare the distribution of progressive.vote across 4 groups: Republican men, Republican women, Democratic men, Democratic women. (Hint: y ~ x1 + x2 in boxplot())
  • Interpret the result.
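
The formula hint above can be sketched as follows. The data frame toy is invented for illustration (the column names follow the codebook, the values do not come from the real data).

```r
# Toy data, invented for illustration
toy <- data.frame(
  progressive.vote = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.5, 0.8),
  republican       = c(1, 1, 1, 1, 0, 0, 0, 0),
  woman            = c(0, 0, 1, 1, 0, 0, 1, 1)
)

# y ~ x1 + x2 draws one box per combination of x1 and x2
bp <- boxplot(progressive.vote ~ republican + woman, data = toy,
              ylab = "Proportion of pro-feminist votes")

# Group labels, one per box, combining the values of republican and woman
bp$names
```

Each label combines the value of republican with the value of woman, so it is worth checking bp$names (or passing a names argument to boxplot()) before interpreting which box is which group.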

Question 3

  • Create a binary variable that denotes if a judge has at least one child.
  • Compare the distribution of this binary variable between Republican and Democratic judges. (Hint: difference in means)
  • Compute the difference in means of progressive.vote between judges who have at least one child and those who don’t.
  • Compute the difference in means of progressive.vote between Republican and Democratic parents.
    • Hint: compute difference in means for each group, or use tapply(..., list(..., ...), mean)
    • e.g. Republican parents: judges who are Republican and who have at least one child
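
A minimal sketch of these steps on invented data. The data frame toy and the variable name parent are hypothetical; only the other column names follow the codebook.

```r
# Toy data, invented for illustration
toy <- data.frame(
  child            = c(0, 0, 1, 2, 3, 0, 2, 1),
  republican       = c(1, 0, 1, 1, 1, 0, 0, 0),
  progressive.vote = c(0.2, 0.6, 0.3, 0.4, 0.5, 0.8, 0.7, 0.6)
)

# Binary indicator: 1 if the judge has at least one child, 0 otherwise
toy$parent <- ifelse(toy$child > 0, 1, 0)

# The mean of a 0/1 variable is a proportion: share of parents in each party
share_parent <- tapply(toy$parent, toy$republican, mean)

# Difference in means of the outcome: parents minus non-parents
diff_parent <- mean(toy$progressive.vote[toy$parent == 1]) -
  mean(toy$progressive.vote[toy$parent == 0])

# Two-way table of group means: rows = parent, columns = republican
means <- tapply(toy$progressive.vote,
                list(parent = toy$parent, republican = toy$republican),
                mean)

# Among parents only: Republican minus Democratic
diff_party_parents <- means["1", "1"] - means["1", "0"]
```

Indexing the tapply() result by the group labels ("1", "0") keeps the direction of each subtraction explicit.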

Question 4

  • Compute the difference in means of progressive.vote between judges who have at least one daughter and those who don’t have any.
  • Repeat the same computation across judges with different numbers of children.
    • only focus on judges with 1, 2, and 3 children.
    • Hint: compute difference in means for each group, or use tapply(..., list(..., ...), mean)
  • What assumption do we need to interpret previous results?
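
One way to condition on the number of children is to compute the group means in a two-way table and subtract its columns. This is a sketch on invented data; toy and the variable name any.girl are hypothetical.

```r
# Toy data, invented for illustration
toy <- data.frame(
  child            = c(1, 1, 2, 2, 2, 3, 3, 0),
  girls            = c(0, 1, 0, 1, 2, 0, 2, 0),
  progressive.vote = c(0.3, 0.5, 0.4, 0.5, 0.7, 0.2, 0.6, 0.4)
)

# Binary indicator: 1 if the judge has at least one daughter, 0 otherwise
toy$any.girl <- ifelse(toy$girls > 0, 1, 0)

# Keep only judges with 1, 2, or 3 children
sub <- subset(toy, child %in% 1:3)

# Rows = number of children, columns = any.girl (0 or 1)
means <- tapply(sub$progressive.vote,
                list(child = sub$child, any.girl = sub$any.girl),
                mean)

# Difference in means within each family size: daughters minus no daughters
diffs <- means[, "1"] - means[, "0"]
diffs
```

Each element of diffs compares judges with the same number of children, so family size cannot drive the difference within a row.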

Question 5

Assumption: conditional on the number of children, the number of daughters a judge has is as good as random. How can we evaluate the validity of this assumption?

  • Compare the means of girls across judges with different numbers of children.
  • Compare the means of girls across judges, divided by other potential confounders in the data.